Abstract. We show that the problem of computing the hybridization number of two rooted binary phylogenetic trees on the same set of taxa X has a constant factor polynomial-time approximation if and only if the problem of computing a minimum-size feedback vertex set in a directed graph (DFVS) has a constant factor polynomial-time approximation. The latter problem, which asks for a minimum number of vertices to be removed from a directed graph to transform it into a directed acyclic graph, is one of the problems in Karp's seminal 1972 list of 21 NP-complete problems. However, despite considerable attention from the combinatorial optimization community, it remains to this day unknown whether a constant factor polynomial-time approximation exists for DFVS. Our result thus places the (in)approximability of hybridization number in a much broader complexity context, and as a consequence we obtain that hybridization number inherits inapproximability results from the problem Vertex Cover. On the positive side, we use results from the DFVS literature to give an O(log r log log r) approximation for hybridization number, where r is the value of an optimal solution to the hybridization number problem.



1. Introduction 



The traditional model for representing the evolution of a set of species X (or, more generally, a 
set of taxa) is the rooted phylogenetic tree [THl [T71 [M] . Essentially, this is a rooted tree where 
the leaves are bijectively labelled by X and the edges are directed away from the unique root. A 
binary rooted phylogenetic tree carries the additional restriction that the root has indegree zero 
and outdegree two, leaves have indegree one and outdegree zero, and all other (internal) vertices 
have indegree one and outdegree two. Rooted binary phylogenetic trees will have a central role in 
this article. 

In recent years there has been a growing interest in extending the phylogenetic tree model to 
also incorporate non-treelike evolutionary phenomena such as hybridizations, recombinations and 
horizontal gene transfers. This has stimulated research into rooted phylogenetic networks which 
generalize rooted phylogenetic trees by also permitting vertices with indegree two or higher, called 
reticulation vertices, or simply reticulations. For detailed background information on phylogenetic networks we refer the reader to EE ED EH E31 EH [33]. In a rooted binary phylogenetic network the reticulation vertices all have indegree two and outdegree one (and all other vertices obey the usual restrictions of a rooted binary phylogenetic tree).

Informally, we say that a phylogenetic network N on X displays a phylogenetic tree T on X if it is possible to delete all but one incoming edge of each reticulation vertex of N such that, after subsequently suppressing vertices which have indegree and outdegree both equal to one, the tree T is obtained (see Figure 1). Following the publication of several seminal articles in 2004-5 (e.g. [21 E]) there has been considerable research interest in the following biologically-inspired question. Given two rooted, binary phylogenetic trees T and T' on the same set of taxa X, what is the minimum number of reticulations required by a phylogenetic network N on X which displays both T and T'? This value is often called the hybridization number in the literature, and when addressing this specific problem the term hybridization network is often used instead of the more general term phylogenetic network. For the purpose of consistency we will henceforth use the term hybridization network in this article.

Figure 1. Two phylogenetic trees, T and T', an acyclic agreement forest F for T and T' and a hybridization network H that displays T and T', and has hybridization number 2. All edges are directed away from the root ρ. Forest F can be obtained from either of T and T' by deleting the dashed edges. Bold edges are used in H to illustrate that this network displays T.

MinimumHybridization, the problem of computing the hybridization number, has been shown to be both NP-hard and APX-hard [7], from which several related phylogenetic network construction techniques also inherit hardness [231 HZ]. APX-hardness means that there exists a constant c > 1 such that the existence of a polynomial-time approximation algorithm that achieves an approximation ratio strictly smaller than c would imply P=NP. As is often the case with APX-hardness results, the value of c given in [7] is very small. It is not known whether MinimumHybridization is actually in APX, the class of problems for which polynomial-time approximation algorithms exist that achieve a constant approximation ratio. In fact, there are to date no non-trivial polynomial-time approximation algorithms, constant factor or otherwise, for hybridization number. This omission stands in stark contrast to other positive results, which we now discuss briefly.

On the fixed parameter tractability (FPT) front (we refer to [131 EE E3 ISO] for an introduction), a variety of increasingly sophisticated algorithms have been developed. These show that for many practical instances of MinimumHybridization the problem can be efficiently solved (to the extent that even enumeration of all optimum solutions is often, in practice, tractable) [4, 6, 9, 10, 32, 36, 38]. Secondly, the problem of computing the rooted subtree prune and regraft (rSPR) distance, which bears at least a superficial similarity to the computation of hybridization number, permits a polynomial-time 3-approximation algorithm [5, 31, 36] and efficient FPT algorithms [3H1 [37]. Why then is it so difficult to give formal performance guarantees for approximating MinimumHybridization?

A clue lies in the nature of the abstraction that (with very few exceptions) is used to compute hybridization numbers, the Maximum Acyclic Agreement Forest (MAAF), introduced in [5] (see Figure 1). Roughly speaking, computing the hybridization number of two trees T and T' is essentially identical to the problem of cutting T and T' into as few vertex-disjoint subtrees as possible such that (i) the subtrees of T are isomorphic to the subtrees of T' and, critically, (ii) a specific "reachability" relation on these subtrees is acyclic. Condition (ii) seems to be the core of the issue, because without this condition the problem would be no different from the problem of computing the rSPR distance, which, as previously mentioned, seems to be comparatively tractable. (Note that the hybridization number of two trees can in general be much larger than their rSPR distance.) The various FPT algorithms for computing hybridization number deal with the unwanted cycles in the reachability relation in a variety of ways, but all resort to some kind of brute force analysis to optimally avoid (e.g. [35]) or break (e.g. [H1I3S]) them.

In this article we demonstrate why it is so difficult to deal with the cycles. It turns out that MinimumHybridization is, in an approximability sense, a close relative of the problem Feedback Vertex Set on directed graphs (DFVS). In this problem we wish to remove a minimum number of vertices from a directed graph to transform it into a directed acyclic graph. DFVS is one of the original NP-complete problems (it is in Karp's famous 1972 list of 21 NP-complete problems [25]) and is also known to be APX-hard [24]. However, despite almost forty years of attention, it is still unknown whether DFVS permits a constant approximation ratio, i.e. whether it is in APX. (The undirected variant of FVS, in contrast, appears to be significantly more tractable: it is 2-approximable even in the weighted case pQ.)

By coupling the approximability of MinimumHybridization to DFVS we show that MinimumHybridization is just as hard as a problem that has so far eluded the entire combinatorial optimization community. Specifically, we show that for every constant c > 1 and every ε > 0, the existence of a polynomial-time c-approximation for MinimumHybridization would imply a polynomial-time (c + ε)-approximation for DFVS. In the other direction we show that, for every c > 1, the existence of a polynomial-time c-approximation for DFVS would imply a polynomial-time 6c-approximation for MinimumHybridization. In other words: DFVS is in APX if and only if MinimumHybridization is in APX. Hence a constant factor approximation algorithm for either problem would be a major breakthrough in theoretical computer science.

There are several interesting spin-off consequences of this result, both negative and positive. On the negative side, it is known that there is a very simple parsimonious reduction from the classical problem Vertex Cover to DFVS [25]. Consequently, a c-approximation for DFVS entails a c-approximation for Vertex Cover, for every c > 1. For c < 10√5 − 21 ≈ 1.3606 there cannot exist a polynomial-time c-approximation of Vertex Cover, assuming P ≠ NP [11, 12]. Also, if the Unique Games Conjecture is true, then for c < 2 there cannot exist a polynomial-time c-approximation of Vertex Cover [28]. (Whether Vertex Cover permits a constant factor approximation ratio strictly smaller than 2 is a long-standing open problem.) The main result in this article hence not only shows that MinimumHybridization is in APX if and only if DFVS is in APX, but also that MinimumHybridization cannot be approximated within any factor smaller than 10√5 − 21 ≈ 1.3606, unless P=NP (and not within any factor smaller than 2 if the Unique Games Conjecture is true). This improves significantly on the current APX-hardness threshold from [7].

On the positive side, we observe that already-existing approximation algorithms for DFVS can be utilised to give asymptotically comparable approximation ratios for MinimumHybridization. To date, the best polynomial-time approximation algorithms for DFVS achieve an approximation ratio of O(min{log n log log n, log τ* log log τ*}), where n is the number of vertices in the graph and τ* is the optimal fractional solution of the problem (taking the weights of the vertices into account) [HIES]. We show that such an algorithm can be used to give an O(log r log log r)-approximation algorithm for MinimumHybridization, where r is the hybridization number of the two input trees. To the best of our knowledge, this is the first non-trivial polynomial-time approximation algorithm for MinimumHybridization.

The main result also has interesting consequences for the fixed parameter tractability of MinimumHybridization. The inflation factor of 6 in the reduction that turns a DFVS approximation into one for MinimumHybridization is very closely linked to a reduction described in [6]. The authors of that article showed that the input trees can be reduced to produce a weighted instance containing at most 14r taxa. (The fact that the reduced instance is weighted means it cannot be automatically used to obtain a constant-factor approximation algorithm.) In this article we sharpen their analysis to show that the reduction they describe actually produces a weighted instance with at most 9r taxa. Without this sharpening, the inflation factor we obtain would have been higher than 6. From this analysis it becomes clear that the kernel size has an important role to play in analysing the approximability of MinimumHybridization.

This raises some interesting general questions about the linkages between MinimumHybridiza- 
tion and DFVS. For example, it can be shown that, in a formal sense, a small modification to the 
reduction described in [5] produces a kernel (without weights) of quadratic size. This contrasts 
sharply with DFVS. It is known that DFVS is fixed parameter tractable [5], but it is not known 
whether DFVS permits a polynomial-size kernel. Might MinimumHybridization give us new 
insights into the structure of DFVS (and vice-versa)? More generally: within which complexity 
frameworks is one of the two problems strictly harder than the other? 

The structure of this article is as follows. In the next section, we define the considered problems formally and describe the reductions that were used to show that MinimumHybridization is fixed parameter tractable. In Section 3 we show an improved bound on the sizes of reduced instances. Subsequently, we use these results to show an approximation-preserving reduction from MinimumHybridization to DFVS in Section 4 and an approximation-preserving reduction from DFVS to MinimumHybridization in Section 5.



2. Preliminaries 

Phylogenetic Trees. Throughout the paper, let X be a finite set of taxa (taxonomic units). A rooted binary phylogenetic X-tree T is a rooted tree whose root has degree two, whose interior vertices have degree three and whose leaves are bijectively labelled by the elements of X. The edges of the tree can be seen as being directed away from the root. The set of leaves of T is denoted by L(T). We identify each leaf with its label. We sometimes call a rooted binary phylogenetic X-tree a tree for short.

In the course of this paper, different types of subtrees play an important role. Let T be a rooted phylogenetic X-tree and X' a subset of X. The minimal rooted subtree of T that connects all leaves in X' is denoted by T(X'). Furthermore, the tree obtained from T(X') by suppressing all non-root degree-2 vertices is the restriction of T to X' and is denoted by T|X'. Lastly, a subtree of T is pendant if it can be detached from T by deleting a single edge.
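To make T(X') and T|X' concrete, here is a minimal Python sketch. It assumes (our choice, not the paper's) that a rooted binary tree is stored as nested pairs with string leaf labels. Because the recursion returns a child in place of any vertex with only one surviving child, it trims T down to T(X') and suppresses the resulting degree-2 vertices in one pass. The helper `canonical` (a hypothetical name) produces an order-independent form, so two restrictions are isomorphic with respect to leaf labels exactly when their canonical forms are equal.

```python
from typing import Optional, Union

# Assumed representation: a leaf is its label (str); an internal vertex is a
# pair (left_child, right_child).
Tree = Union[str, tuple]


def restrict(tree: Tree, keep: frozenset) -> Optional[Tree]:
    """Return T|X' for X' = keep, or None if no leaf of `tree` lies in keep."""
    if isinstance(tree, str):
        return tree if tree in keep else None
    left, right = (restrict(child, keep) for child in tree)
    if left is None:
        return right  # only one child survives: suppress this vertex
    if right is None:
        return left
    return (left, right)


def canonical(tree: Tree) -> Tree:
    """Sort children recursively so that leaf-labelled isomorphic trees compare equal."""
    if isinstance(tree, str):
        return tree
    first, second = sorted((canonical(child) for child in tree), key=repr)
    return (first, second)
```

For example, restricting (((a, b), c), (d, e)) to {a, c, d} gives ((a, c), d): the vertex above a and b loses b and is suppressed, and so does the vertex above d and e.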

Hybridization Networks. A hybridization network H on a set X is a rooted acyclic directed graph which has a single root of outdegree at least 2, has no vertices with indegree and outdegree both 1, and in which the vertices of outdegree 0 are bijectively labelled by the elements of X. A hybridization network is binary if all vertices have indegree and outdegree at most 2 and every vertex with indegree 2 has outdegree 1.

For each vertex v of H, we denote by d⁻(v) and d⁺(v) its indegree and outdegree, respectively. If (u, v) is an arc of H, we say that u is a parent of v and that v is a child of u. Furthermore, if there is a directed path from a vertex u to a vertex v, we say that u is an ancestor of v and that v is a descendant of u.

A vertex of indegree greater than one represents an evolutionary event in which lineages combined, such as a hybridization, recombination or horizontal gene transfer event. We call these vertices hybridization vertices. To quantify the number of hybridization events, the hybridization number of a hybridization network H with root ρ is given by

$$h(H) = \sum_{v \neq \rho} \bigl(d^-(v) - 1\bigr).$$

Observe that h(H) = 0 if and only if H is a tree.
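The sum defining h(H) can be evaluated directly from the arcs of a network. The sketch below assumes, purely for illustration, that a network is given as a list of (parent, child) pairs together with the name of its root; the root is excluded because its indegree is 0.

```python
from collections import Counter


def hybridization_number(arcs, root):
    """h(H): sum of (indegree(v) - 1) over all vertices v other than the root."""
    indegree = Counter(child for _, child in arcs)
    vertices = {u for u, _ in arcs} | {v for _, v in arcs}
    return sum(indegree[v] - 1 for v in vertices if v != root)


# A tiny network in which x has two parents (u and w), so h(H) = 1.
network = [("rho", "u"), ("rho", "w"), ("u", "x"), ("w", "x"),
           ("u", "a"), ("w", "b"), ("x", "c")]

# A tree: every non-root vertex has indegree 1, so h = 0.
tree = [("rho", "u"), ("rho", "b"), ("u", "a"), ("u", "c")]
```

As observed above, the value is 0 exactly for trees; every additional incoming arc on a vertex adds one to the count.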
Let H be a hybridization network on X and T a rooted binary phylogenetic X'-tree with X' ⊆ X. We say that T is displayed by H if T can be obtained from H by deleting vertices and edges and suppressing vertices with d⁺(v) = d⁻(v) = 1 (or, in other words, if a subdivision of T is a subgraph of H). Intuitively, if H displays T, then all of the ancestral relationships visualized by T are visualized by H.

The problem MinimumHybridization is to compute the hybridization number of two rooted 
binary phylogenetic X-trees T and T', which is defined as 

$$h(T,T') = \min\{h(H) : H \text{ is a hybridization network that displays } T \text{ and } T'\},$$

i.e., the minimum number of hybridization events necessary to display two rooted binary phylo- 
genetic trees. 

This problem can be formulated as an optimization problem as follows. 
Problem: MinimumHybridization 

Instance: Two rooted binary phylogenetic X-trees T and T'.
Solution: A hybridization network H that displays T and T'.
Objective: Minimize h(H). 

If H is a hybridization network that displays T and T', then there also exists a binary hybridization network H' that displays T and T' such that h(H') = h(H) [..., Lemma 3]. Hence, we restrict our analysis to binary hybridization networks and will not emphasize again that we only deal with this kind of network.

Agreement Forests. A useful characterization of MinimumHybridization in terms of agreement forests was discovered by Baroni et al. [2], building on an idea in [TU]. Bordewich and Semple [7] used this characterization to show that MinimumHybridization is NP-hard. Such agreement forests play a fundamental role in this paper. For the purpose of the upcoming definition and, in fact, much of the paper, we regard the root of a tree T (or network H) as a vertex ρ at the end of a pendant edge adjoined to the original root. Furthermore, we view ρ as an element of the label set of T; thus L(T) = X ∪ {ρ}.

Let T and T' be two rooted binary phylogenetic X-trees. A partition F = {C_ρ, C_1, C_2, ..., C_k} of X ∪ {ρ} is an agreement forest for T and T' if ρ ∈ C_ρ and the following conditions are satisfied:

(1) for all i ∈ {ρ, 1, 2, ..., k}, we have T|C_i ≅ T'|C_i, and

(2) the trees in {T(C_i) : i ∈ {ρ, 1, 2, ..., k}} and {T'(C_i) : i ∈ {ρ, 1, 2, ..., k}} are vertex-disjoint subtrees of T and T', respectively.

In the definition above, the notation ≅ is used to denote a graph isomorphism that preserves leaf-labels.

Note that, even though an agreement forest is formally defined as a partition of the leaves, we often see the collection of trees {T|C_ρ, T|C_1, ..., T|C_k} as the agreement forest. So, intuitively, an agreement forest for T and T' can be seen as a collection of trees that can be obtained from either of T and T' by deleting a set of edges and subsequently "cleaning up" by deleting unlabelled vertices and suppressing indegree-1 outdegree-1 vertices (see Figure 2). Therefore, we often refer to the elements of an agreement forest as components.
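Condition (2) of the definition can be checked mechanically for one tree at a time. The sketch below assumes an illustrative representation: trees are nested pairs of string labels with ρ already attached as a leaf (for example ("rho", T)), and each vertex is named by its 0/1 path from the root. It relies on the fact that a vertex lies in T(C) exactly when it is a descendant of (or equal to) the lowest common ancestor of C and has at least one leaf of C below it. Condition (1), comparing T|C_i with T'|C_i, is not checked here, and all function names are hypothetical.

```python
from itertools import combinations


def leaf_sets(tree, path=()):
    """Map every vertex (named by its 0/1 path from the root) to its set of leaf labels."""
    if isinstance(tree, str):
        return {path: frozenset([tree])}
    below = {}
    for i, child in enumerate(tree):
        below.update(leaf_sets(child, path + (i,)))
    below[path] = below[path + (0,)] | below[path + (1,)]
    return below


def spanning_vertices(below, C):
    """Vertex set of T(C), the minimal subtree connecting the leaves in C."""
    covering = [v for v, leaves in below.items() if C <= leaves]
    lca = max(covering, key=len)  # the deepest vertex whose leaf set contains C
    return {v for v, leaves in below.items() if leaves & C and v[:len(lca)] == lca}


def vertex_disjoint(tree, parts):
    """Condition (2) for one tree: the subtrees T(C_i) are pairwise vertex-disjoint."""
    below = leaf_sets(tree)
    spans = [spanning_vertices(below, frozenset(C)) for C in parts]
    return all(not (a & b) for a, b in combinations(spans, 2))
```

On ("rho", (("a", "b"), ("c", "d"))), the partition {ρ, a, b}, {c, d} passes, whereas {ρ, a, c}, {b, d} fails because both spanning subtrees use the vertex above all of a, b, c and d.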

The size of an agreement forest F is defined as its number of elements (components) and is denoted by |F|.

A characterization of the hybridization number h(T,T') in terms of agreement forests requires an additional condition. Let F = {C_ρ, C_1, C_2, ..., C_k} be an agreement forest for T and T'. Let G_F be the directed graph that has vertex set F and an edge (C_i, C_j) if and only if i ≠ j and at least one of the two following conditions holds:

(1) the root of T(C_i) is an ancestor of the root of T(C_j) in T;

(2) the root of T'(C_i) is an ancestor of the root of T'(C_j) in T'.

The graph G_F is called the inheritance graph associated with F. We call F an acyclic agreement forest for T and T' if G_F has no directed cycles. If F contains the smallest number of elements (components) over all acyclic agreement forests for T and T', we say that F is a maximum acyclic agreement forest for T and T'. Note that such a forest is called a maximum acyclic agreement forest, even though one minimizes the number of elements, because in some sense the "agreement" is maximized. (Also note that acyclic agreement forests were called good agreement forests in [2].)
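The inheritance graph G_F and the acyclicity test can be sketched as follows. To keep the example small, we assume (hypothetically) that the root of each T(C_i) and T'(C_i) has already been located and is supplied as input, together with child-to-parent maps for T and T'. Because the subtrees are vertex-disjoint, their roots are distinct, so "ancestor" can be tested as "proper ancestor". Acyclicity is decided with Kahn's algorithm.

```python
from collections import deque


def is_proper_ancestor(parent, u, v):
    """True if u is a proper ancestor of v, given a child -> parent map."""
    while v in parent:
        v = parent[v]
        if v == u:
            return True
    return False


def inheritance_graph(components, root_T, parent_T, root_Tp, parent_Tp):
    """Arcs (Ci, Cj), i != j, of G_F: an ancestor relation between roots in T or in T'."""
    arcs = set()
    for ci in components:
        for cj in components:
            if ci != cj and (
                is_proper_ancestor(parent_T, root_T[ci], root_T[cj])
                or is_proper_ancestor(parent_Tp, root_Tp[ci], root_Tp[cj])
            ):
                arcs.add((ci, cj))
    return arcs


def is_acyclic(vertices, arcs):
    """Kahn's algorithm: acyclic iff every vertex is eventually removed at indegree 0."""
    indegree = {v: 0 for v in vertices}
    successors = {v: [] for v in vertices}
    for u, v in arcs:
        successors[u].append(v)
        indegree[v] += 1
    queue = deque(v for v in vertices if indegree[v] == 0)
    removed = 0
    while queue:
        u = queue.popleft()
        removed += 1
        for v in successors[u]:
            indegree[v] -= 1
            if indegree[v] == 0:
                queue.append(v)
    return removed == len(indegree)
```

If the root of C_1 lies above the root of C_2 in T but below it in T', the arcs (C_1, C_2) and (C_2, C_1) form a directed cycle, and the forest is not acyclic.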

We define m_a(T,T') to be the number of elements of a maximum acyclic agreement forest for T and T' minus one. The problem of computing m_a(T,T') also has an optimization counterpart:

Problem: Maximum Acyclic Agreement Forest (MAAF)
Instance: Two rooted binary phylogenetic X-trees T and T'.
Solution: An acyclic agreement forest F for T and T'.
Objective: Minimize |F| − 1.

We minimize |F| − 1 rather than |F|, following [7], because |F| − 1 corresponds to the number of edges one needs to remove from either of the input trees to obtain F (after "cleaning up") and because of the relation, described below, between this problem and MinimumHybridization. Nevertheless, it can be shown that, from an approximation perspective, it does not matter whether one minimizes |F| or |F| − 1 (although this is not obvious).

Theorem 1. [2, Theorem 2] Let T and T' be two rooted binary phylogenetic X-trees. Then

$$h(T,T') = m_a(T,T').$$

It is this characterization that was used by Bordewich and Semple [7] to show that MinimumHybridization is NP-hard. To show that an approximation for one problem can also be used to approximate the other problem, one needs the following slightly stronger result.

Theorem 2. Let T and T' be two rooted binary phylogenetic X-trees. Then

(i) from a hybridization network H that displays T and T', one can construct in polynomial time an acyclic agreement forest F for T and T' such that |F| − 1 ≤ h(H), and
(ii) from an acyclic agreement forest F for T and T', one can construct in polynomial time a hybridization network H that displays T and T' such that h(H) ≤ |F| − 1.

This result follows from the proof of [2, Theorem 2], using the observation above that we may assume that H is binary.

We now formally introduce the last optimization problem discussed in this paper. A feedback vertex set (FVS) of a directed graph D is a subset of the vertices that contains at least one vertex from each directed cycle in D. Equivalently, a subset V' of the vertices of D is a feedback vertex set if and only if removing V' from D gives a directed acyclic graph. The minimum feedback vertex set problem on directed graphs (DFVS) is defined as follows: given a directed graph D, find a feedback vertex set of D that has minimum size.
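Since DFVS is NP-hard, an exact solver is only practical for very small graphs. As an illustration of the definition (not of any algorithm used in this paper), the sketch below finds a minimum feedback vertex set by trying vertex subsets in order of increasing size. Acyclicity is tested with the standard-library `graphlib.TopologicalSorter` (Python 3.9 or later), whose `prepare()` method raises `CycleError` when the graph contains a cycle.

```python
from graphlib import CycleError, TopologicalSorter
from itertools import combinations


def is_acyclic(vertices, arcs):
    """True if the digraph (vertices, arcs) has no directed cycle."""
    predecessors = {v: set() for v in vertices}
    for u, v in arcs:
        predecessors[v].add(u)
    try:
        TopologicalSorter(predecessors).prepare()  # raises CycleError on a cycle
    except CycleError:
        return False
    return True


def minimum_dfvs(vertices, arcs):
    """Exact minimum feedback vertex set by exhaustive search (exponential time)."""
    vertices = list(vertices)
    for k in range(len(vertices) + 1):
        for removed in combinations(vertices, k):
            gone = set(removed)
            kept = [v for v in vertices if v not in gone]
            kept_arcs = [(u, v) for u, v in arcs if u not in gone and v not in gone]
            if is_acyclic(kept, kept_arcs):
                return gone  # the first hit has minimum size, since k increases
```

For the arcs 1→2, 2→3, 3→1, 3→4, 4→3, vertex 3 lies on both directed cycles, so {3} is a minimum feedback vertex set.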

Reductions and Fixed Parameter Tractability. After establishing the NP-hardness of MinimumHybridization, the same authors showed that this problem is also fixed parameter tractable [6]. They show how to reduce a pair of rooted binary phylogenetic X-trees T and T' such that the number of leaves of the reduced trees is bounded by 14h(T,T'), whence a brute-force algorithm can be used to solve the reduced instance, giving a fixed parameter tractable algorithm.

Figure 2. Two input trees T and T', an agreement forest F for T and T' and the inheritance graph G_F. The trees have two common chains: (a_1, a_2) and (b_1, b_2, b_3). In the agreement forest, chain (a_1, a_2) is atomized while chain (b_1, b_2, b_3) survives. The agreement forest F is acyclic because G_F is acyclic.

To describe the reductions, we need some additional definitions. Let T be a rooted binary phylogenetic X-tree. For n ≥ 2, an n-chain of T is an n-tuple (a_1, a_2, ..., a_n) of elements of L(T) \ {ρ} such that the parent of a_1 is either the same as the parent of a_2 or the parent of a_1 is a child of the parent of a_2 and, for each i ∈ {2, 3, ..., n − 1}, the parent of a_i is a child of the parent of a_{i+1}; i.e., the subgraph induced by a_1, a_2, ..., a_n and their parents is a caterpillar (see Figure 2).

Now, let A = (a_1, a_2, ..., a_n) be an n-chain that is common to two rooted binary phylogenetic X-trees T and T' with n ≥ 2, and let F be an acyclic agreement forest for T and T'. We say that A survives in F if there exists an element in F that is a superset of {a_1, a_2, ..., a_n}, while we say that A is atomized in F if each element in {a_1, a_2, ..., a_n} is a singleton in F (see Figure 2). Furthermore, if a pendant subtree is common to T and T', then we say that it survives in F if there is an element of F that is a superset of its label set.
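The n-chain condition only inspects parents and grandparents, so it is easy to test given a child-to-parent map. The sketch below is a literal transcription of the definition under that assumed representation; `is_chain` is a hypothetical helper, and ρ is simply left out of the set of admissible leaves.

```python
def is_chain(parent, leaves, chain):
    """Test whether chain = (a_1, ..., a_n), n >= 2, is an n-chain of the tree.

    parent: maps every non-root vertex to its parent.
    leaves: the leaf labels of the tree other than rho.
    """
    if len(chain) < 2 or any(a not in leaves for a in chain):
        return False
    p = parent.get
    a1, a2 = chain[0], chain[1]
    # the parent of a_1 equals, or is a child of, the parent of a_2
    if not (p(a1) == p(a2) or p(p(a1)) == p(a2)):
        return False
    # for i = 2, ..., n - 1 (1-based): the parent of a_i is a child of the parent of a_{i+1}
    return all(p(p(chain[i])) == p(chain[i + 1]) for i in range(1, len(chain) - 1))


# The caterpillar (((a, b), c), d): v1 is the parent of a and b, v2 of v1 and c, v3 of v2 and d.
parent = {"a": "v1", "b": "v1", "v1": "v2", "c": "v2", "v2": "v3", "d": "v3"}
leaves = {"a", "b", "c", "d"}
```

Here (a, b, c, d) is a 4-chain, and so is any contiguous part of it such as (b, c), whereas (c, a) is not a chain because the parent of c is above, not below, the parent of a.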

The following lemma basically shows that we can reduce subtrees and chains. It differs slightly from the corresponding lemma in [6] because we consider approximations, while Bordewich and Semple considered only optimal solutions in that paper.

Lemma 1. Let F be an acyclic agreement forest for two trees T and T'. Then there exists an acyclic agreement forest F' for T and T' with |F'| ≤ |F| such that

1. every common pendant subtree of T and T' survives in F', and

2. every common n-chain of T and T', with n ≥ 3, either survives or is atomized in F'.

Moreover, F' can be obtained from F in polynomial time.

Proof. Follows from the proof of [6, Lemma 3.1]. There are two differences with [6, Lemma 3.1]. Firstly, our result is slightly simpler because we consider two unweighted trees T and T', while the authors of [6] allow the unreduced trees T and T' to already have weights on 2-chains. Secondly, [6, Lemma 3.1] only shows the result for optimal agreement forests. However, a careful analysis of the proof of [6, Lemma 3.1] shows that it can also be used to prove this lemma. □

We are now ready to formally describe the aforementioned tree reductions. Let T and T' be two rooted binary phylogenetic X-trees, P a set that is initially empty and w : P → Z⁺ a weight function on the elements in P.

Subtree Reduction. Replace any maximal pendant subtree with at least two leaves that is common to T and T' by a single leaf with a new label.

Chain Reduction. Replace any maximal n-chain (a_1, a_2, ..., a_n), with n ≥ 3, that is common to T and T' by a 2-chain with new labels a and b. Moreover, add a new element (a, b) with weight w(a, b) = n − 2 to P.

Let S and S' be two rooted binary phylogenetic X'-trees that have been obtained from T and T' by first applying subtree reductions as often as possible and then applying chain reductions as often as possible. We call S and S' the reduced tree pair with respect to T and T'. Note that a reduced tree pair always has an associated set P that contains one element for each chain reduction applied. Moreover, S and S' are unambiguously defined (up to the choice of the new labels) because maximal common pendant subtrees do not overlap and maximal common chains do not overlap. Furthermore, applications of the chain reduction cannot create any new common pendant subtrees with at least two leaves. Hence, it is not necessary to apply subtree reductions again after the chain reductions.

Recall that every common n-chain, with n ≥ 3, either survives or is atomized (Lemma 1). In S and S', such chains have been replaced by weighted 2-chains. Therefore, we are only interested in acyclic agreement forests for S and S' in which these weighted 2-chains either survive or are atomized. We therefore introduce a third notion of an agreement forest. Recall that P is the set of reduced (i.e. weighted) 2-chains. We say that an agreement forest F for S and S' is legitimate if it is acyclic and every chain (a, b) ∈ P either survives or is atomized in F.

Let F be an agreement forest for S and S'. The weight of F, denoted by w(F), is defined to be

$$w(F) = |F| - 1 + \sum_{(a,b) \in P \,:\, (a,b) \text{ is atomized in } F} w(a,b).$$

Lastly, we define f(S,S') to be the minimum weight of a legitimate agreement forest for S and S'.
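The weight w(F) needs only the partition F and the weighted 2-chains in P. In the sketch below, with hypothetical helper names, F is a list of label sets and P is a dictionary from 2-chains (a, b) to w(a, b). Each reduced chain is classified as surviving, atomized, or neither; a legitimate forest must additionally have an acyclic inheritance graph, which this sketch does not test.

```python
def chain_status(forest, chain):
    """Return 'survives', 'atomized' or None for a chain with respect to a partition."""
    labels = set(chain)
    parts = [set(part) for part in forest]
    if any(labels <= part for part in parts):
        return "survives"
    if all({a} in parts for a in chain):
        return "atomized"
    return None


def forest_weight(forest, weights):
    """w(F) = |F| - 1 + sum of w(a, b) over the chains (a, b) in P atomized in F."""
    return len(forest) - 1 + sum(
        w for chain, w in weights.items() if chain_status(forest, chain) == "atomized"
    )


F = [{"rho", "x", "y"}, {"a"}, {"b"}, {"c", "d"}]
P = {("a", "b"): 3, ("c", "d"): 5}
# (a, b) is atomized and (c, d) survives, so w(F) = (4 - 1) + 3 = 6.
```

Intuitively, an atomized reduced chain stands for the n − 2 extra singleton components of the original chain that the 2-chain no longer shows, which is why its weight is added.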

The following lemma says that computing the hybridization number of T and T' is equivalent to computing the minimum weight of a legitimate agreement forest for S and S'. The second part of the lemma is necessary to show that an approximation to a reduced instance S and S' can be used to obtain an approximation to the original instance T and T'.

Lemma 2. Let T and T' be a pair of rooted binary phylogenetic X-trees and let S and S' be the reduced tree pair with respect to T and T'. Then

(i) h(S,S') ≤ f(S,S') = h(T,T'), and

(ii) given a legitimate agreement forest F_S for S and S', we can find, in polynomial time, an acyclic agreement forest F for T and T' such that |F| − 1 = w(F_S).

Proof. In part (i), the inequality follows directly from the definition of f, while the equality is equivalent to [6, Proposition 3.2] if the unreduced trees T and T' are unweighted (i.e. if P is initially empty). Part (ii) follows from the proof of [6, Proposition 3.2]. □

The fixed parameter tractability of MinimumHybridization now follows from the next lemma, which bounds the number of leaves in a reduced tree pair.

Lemma 3. [6, Lemma 3.3] Let T and T' be two rooted binary phylogenetic X-trees, S and S' the reduced tree pair with respect to T and T', and X' the label set of S and S'. If h(T,T') > 0, then |X'| ≤ 14h(T,T').

We show in Section 3 that the reduced trees have at most 9h(T,T') leaves. This improved bound will be important in the approximation-preserving reductions we give later in the paper.
3. AN IMPROVED BOUND ON THE SIZE OF REDUCED INSTANCES OF MINIMUMHYBRIDIZATION

We start with some definitions and an intermediate result. The bound on the size of the reduced instance will be proven in Theorem 3.

An r-reticulation generator (for short, r-generator) is defined to be a directed acyclic multigraph with a single vertex of indegree 0 and outdegree 1, precisely r reticulation vertices (indegree 2 and outdegree at most 1), and apart from that only vertices of indegree 1 and outdegree 2 [23]. The sides of an r-generator are defined as the union of its edges (the edge sides) and its vertices of indegree 2 and outdegree 0 (the node sides). Adding a set of labels L to an edge side (u, v) of an r-generator involves subdividing (u, v) to a path of |L| internal vertices and, for each such internal vertex w, adding a new leaf w', an edge (w, w'), and labelling w' with some taxon from L (such that L bijectively labels the new leaves). On the other hand, adding a label l to a node side v consists of adding a new leaf y, an edge (v, y) and labelling y with l.
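The two labelling operations can be written down directly on an arc list. The sketch below assumes, for illustration only, that a generator is a list of (tail, head) pairs in which parallel arcs may repeat, that each new leaf is identified with its taxon label, and that new internal vertices are named `w_<taxon>` (a hypothetical naming scheme, unique because taxa are distinct).

```python
def add_labels_to_edge_side(arcs, side, labels):
    """Subdivide the edge side (u, v) into a path with len(labels) internal vertices
    and hang one new leaf, labelled by a distinct taxon from `labels`, off each."""
    u, v = side
    arcs = list(arcs)
    arcs.remove((u, v))  # removes a single copy: generators may have parallel arcs
    previous = u
    for taxon in labels:
        w = f"w_{taxon}"  # hypothetical name for the new internal vertex
        arcs += [(previous, w), (w, taxon)]
        previous = w
    arcs.append((previous, v))
    return arcs


def add_label_to_node_side(arcs, v, label):
    """Attach a new leaf labelled `label` to the node side v (indegree 2, outdegree 0)."""
    return list(arcs) + [(v, label)]
```

Starting from the 1-generator rho → u with two parallel arcs u → x, labelling the two edge sides (u, x) with a and b and the node side x with c yields a hybridization network with leaves a, b, c in which x is the single reticulation.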

Lemma 4. Let T and T' be two rooted binary phylogenetic X-trees with no common pendant subtrees with at least 2 leaves and let H be a hybridization network that displays T and T' with a minimum number of hybridization vertices. Then the network H' obtained from H by deleting all leaves and suppressing each resulting vertex v with d⁺(v) = d⁻(v) = 1 is an h(H)-generator.

Proof. By construction, H' contains the same number of hybridization vertices as H. Additionally, by the definition of a binary hybridization network, no vertex has indegree 2 and outdegree greater than 1, indegree greater than 2, or indegree and outdegree both 1. Now, we claim that H' does not have any vertex with indegree 1 and outdegree 0. To see that this holds, suppose that there exists a vertex v in H' such that d⁻(v) = 1 and d⁺(v) = 0. Then v has two children in H. Since d⁺(v) = 0 in H', no hybridization vertex can be reached by a directed path from v in H. This means that the subnetwork of H rooted at v is actually a rooted tree, contradicting the fact that T and T' do not have any common pendant subtree with two or more leaves. We may thus conclude that H' conforms to the definition of an h(H)-generator. □

Conversely, by inverting the operations of suppression and deletion, H can be obtained from the h(H)-generator H' associated with H by adding leaves to its sides (in the sense described at the start of this section). A similar technique was described in [26] in a somewhat different context.

Theorem 3. Let T and T' be two rooted binary phylogenetic X-trees and S and S' the reduced tree pair on X' with respect to T and T'. If h(T,T') > 0, then |X'| ≤ 9h(T,T').

Proof. Let H' be the h(H)-generator that is associated with a hybridization network H for S and S' whose number of hybridization vertices is minimized, i.e., h(H) = h(S,S'). By definition, H' has the following vertices:

• r = h(H) reticulations; in particular r_0 reticulations with indegree 2 and outdegree 0 and r_1 reticulations with indegree 2 and outdegree 1,

• s vertices with indegree 1 and outdegree 2, and

• one root vertex with indegree 0 and outdegree 1.

The total indegree of H' is 2r_0 + 2r_1 + s. The total outdegree of H' is r_1 + 2s + 1. Hence, 2r_0 + 2r_1 + s = r_1 + 2s + 1, implying s = 2r_0 + r_1 − 1. Moreover, the total number of edges of H', |E(H')|, equals the total indegree and, therefore,

$$|E(H')| = 2r_0 + 2r_1 + s = 2r_0 + 2r_1 + 2r_0 + r_1 - 1 = 4r_0 + 3r_1 - 1. \tag{1}$$
Note that for each of the $r_0$ node sides $v$ in $\mathcal{H}'$ the child of $v$ in $\mathcal{H}$ is a single leaf. Moreover, each edge side in $\mathcal{H}'$ cannot correspond to a directed path in $\mathcal{H}$ that consists of more than three edges since, otherwise, $S$ and $S'$ would have a common $n$-chain with $n \geq 3$. Thus, $\mathcal{H}$ can have at most two leaves per edge side of $\mathcal{H}'$ and one leaf per node side of $\mathcal{H}'$. Consequently, the total number of leaves of $\mathcal{H}$ is bounded by

$$
\begin{aligned}
|X'| &\leq 2|E(\mathcal{H}')| + r_0 \\
&= 2(4r_0 + 3r_1 - 1) + r_0 \\
&= 9r_0 + 6r_1 - 2 \\
&\leq 9r - 2 \\
&< 9h(S,S') \\
&\leq 9h(T,T'),
\end{aligned}
$$

where the last inequality follows from Lemma 2. □
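The degree counting in the proof of Theorem 3 is easy to check mechanically. The following Python sketch is illustrative only; the helper name `generator_counts` is ours, not the paper's. It recomputes $s$ and $|E(\mathcal{H}')|$ from $r_0$ and $r_1$ via the handshake identity (every edge has exactly one head and one tail) and confirms the leaf bound $9r_0 + 6r_1 - 2 < 9r$ on a small grid of values.

```python
def generator_counts(r0: int, r1: int) -> dict:
    """Degree bookkeeping for a generator with r0 node sides and r1 other reticulations.

    Hypothetical helper: node sides have indegree 2 and outdegree 0, the other
    r1 reticulations have indegree 2 and outdegree 1, the s split vertices have
    indegree 1 and outdegree 2, and the root has indegree 0 and outdegree 1.
    """
    s = 2 * r0 + r1 - 1            # from total indegree == total outdegree
    indeg = 2 * r0 + 2 * r1 + s    # every edge has exactly one head ...
    outdeg = r1 + 2 * s + 1        # ... and exactly one tail
    assert indeg == outdeg
    edges = indeg                  # |E(H')|, equation (1)
    leaf_bound = 2 * edges + r0    # <= 2 leaves per edge side, 1 per node side
    return {"s": s, "edges": edges, "leaf_bound": leaf_bound}


if __name__ == "__main__":
    for r0 in range(6):
        for r1 in range(6):
            if r0 + r1 == 0:
                continue  # Theorem 3 assumes h(T, T') > 0
            c = generator_counts(r0, r1)
            assert c["edges"] == 4 * r0 + 3 * r1 - 1
            assert c["leaf_bound"] == 9 * r0 + 6 * r1 - 2 < 9 * (r0 + r1)
    print("identities hold for all r0, r1 in 0..5 with r0 + r1 > 0")
```

The same quantity $|E(\mathcal{H}')| + r_0 = 5r_0 + 3r_1 - 1$ reappears in the proof of Lemma 5.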

4. AN APPROXIMATION-PRESERVING REDUCTION FROM MINIMUMHYBRIDIZATION TO DFVS

We start by proving the following theorem, which refers to wDFVS, the weighted variant of DFVS in which every vertex is assigned a weight and the weight of a feedback vertex set is simply the sum of the weights of its constituent vertices. Later in the section we will prove a corresponding result for DFVS.

Theorem 4. If, for some $c > 1$, there exists a polynomial-time $c$-approximation for wDFVS, then there exists a polynomial-time $6c$-approximation for MinimumHybridization.

Throughout this section, let $T$ and $T'$ be two rooted binary phylogenetic $X$-trees, and let $S$ and $S'$ be the reduced tree pair on $X'$ with respect to $T$ and $T'$. Using Lemma 1, we assume throughout this section without loss of generality that $T$ and $T'$ do not contain any common pendant subtrees with at least two leaves. Thus, the reduced tree pair $S$ and $S'$ can be obtained from $T$ and $T'$ by applying the chain reduction only.

Before starting the proof, we need some additional definitions and lemmas. We say that a common chain $(a, b)$ of $S$ and $S'$ is a reduced chain if it is not a common chain of $T$ and $T'$. Otherwise, $(a, b)$ is an unreduced chain. Furthermore, a taxon $\ell \in X' \cup \{\rho\}$ is a non-chain taxon if it does not label a leaf of a reduced or unreduced chain of $S$ and $S'$. Now, let $\mathcal{B}_S$ be the forest that contains exactly the following elements:

(1) for each non-chain taxon $\ell$ of $S$ and $S'$, a non-chain element $\{\ell\}$, and

(2) for each reduced and unreduced chain $(a, b)$ of $S$ and $S'$, an element $\{a, b\}$.

Clearly, $\mathcal{B}_S$ is an agreement forest for $S$ and $S'$, and we refer to it as a chain forest for $S$ and $S'$. Now, obtain $\mathcal{B}_T$ from $\mathcal{B}_S$ by replacing each element in $\mathcal{B}_S$ that contains the two labels of a reduced chain, say $(a, b)$, of $S$ and $S'$ with the label set that contains precisely all labels of the common $n$-chain that has been reduced to $(a, b)$ in the course of obtaining $S$ and $S'$ from $T$ and $T'$, respectively. The set $\mathcal{B}_T$ is an agreement forest for $T$ and $T'$, and we refer to it as a chain forest for $T$ and $T'$. Since the chain reduction can be performed in polynomial time [6], the chain forests $\mathcal{B}_S$ and $\mathcal{B}_T$ can also be calculated in polynomial time from $T$ and $T'$. Lastly, each element in $\mathcal{B}_T$ whose members label the leaves of a common $n$-chain in $T$ and $T'$ with $n \geq 2$ is referred to as a chain element.
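To make this bookkeeping concrete, here is a minimal Python sketch. It assumes the common chains of $S$ and $S'$ have already been found (for instance by the chain reduction of [6]); the input format, the root label `"rho"` and the function name `chain_forests` are hypothetical. It builds $\mathcal{B}_S$ and $\mathcal{B}_T$ as lists of label sets and reports the chain elements; $|\mathcal{B}_T| = |\mathcal{B}_S|$ holds by construction, since elements are only expanded, never added.

```python
from typing import Dict, FrozenSet, Iterable, List, Tuple

Pair = Tuple[str, str]


def chain_forests(
    non_chain_taxa: Iterable[str],
    common_chains: Iterable[Pair],
    expansion: Dict[Pair, Tuple[str, ...]],
) -> Tuple[List[FrozenSet[str]], List[FrozenSet[str]], List[FrozenSet[str]]]:
    """Return (B_S, B_T, chain elements of B_T) as lists of label sets.

    non_chain_taxa: labels of X' plus the root label that lie on no common chain.
    common_chains:  every reduced or unreduced common 2-chain (a, b) of S and S'.
    expansion:      maps each *reduced* chain (a, b) to the common n-chain of T
                    and T' it replaced; unreduced chains are simply absent.
    """
    singles = [frozenset({t}) for t in non_chain_taxa]
    chains = list(common_chains)
    b_s = singles + [frozenset(pair) for pair in chains]
    # B_T: same elements, except that reduced chains get their full label set back.
    b_t = singles + [frozenset(expansion.get(pair, pair)) for pair in chains]
    # Non-chain elements are singletons, so chain elements are those of size >= 2.
    chain_elements = [e for e in b_t if len(e) >= 2]
    return b_s, b_t, chain_elements
```

For example, with non-chain taxa `rho` and `x`, an unreduced chain `("c", "d")` and a reduced chain `("a1", "a2")` that stands for `("a1", "a2", "a3", "a4")`, both forests have four elements and $\mathcal{B}_T$ has two chain elements.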

The next lemma bounds the number of elements in a chain forest. 
Lemma 5. Let $T$ and $T'$ be two rooted binary phylogenetic $X$-trees. Let $S$ and $S'$ be the reduced tree pair with respect to $T$ and $T'$. Furthermore, let $\mathcal{B}_S$ and $\mathcal{B}_T$ be the chain forests for $S$ and $S'$, and for $T$ and $T'$, respectively. Then $|\mathcal{B}_T| = |\mathcal{B}_S| < 5h(T,T')$.

Proof. By construction of $\mathcal{B}_T$ from $\mathcal{B}_S$, it immediately follows that $|\mathcal{B}_T| = |\mathcal{B}_S|$. To show that $|\mathcal{B}_S| < 5h(T,T')$, let $\mathcal{H}$ be a hybridization network that displays $S$ and $S'$ such that its number of hybridization vertices is minimized over all such networks. Furthermore, let $\mathcal{H}'$ be the generator associated with $\mathcal{H}$. As in the proof of Theorem 3, let $r_0$ be the number of node sides, i.e. reticulations with indegree 2 and outdegree 0, in $\mathcal{H}'$ and let $r_1$ be the number of reticulations in $\mathcal{H}'$ with indegree 2 and outdegree 1. Again, $r_0 + r_1 = h(\mathcal{H}) = h(S,S')$. Recall that, to obtain $\mathcal{H}$ from $\mathcal{H}'$, we add one leaf to each node side of $\mathcal{H}'$, corresponding to a singleton in $\mathcal{B}_S$, and at most two leaves to each edge side of $\mathcal{H}'$. Each edge side of $\mathcal{H}'$ to which we add two taxa corresponds to a 2-chain of $S$ and $S'$ and, therefore, to a single element in $\mathcal{B}_S$. Hence, using (1) and Lemma 2, we have

$$|\mathcal{B}_T| = |\mathcal{B}_S| \leq |E(\mathcal{H}')| + r_0 = 5r_0 + 3r_1 - 1 < 5(r_0 + r_1) = 5h(S,S') \leq 5h(T,T').$$

□

Consider again the chain forest $\mathcal{B}_T$ for $T$ and $T'$. We define a $\mathcal{B}_T$-splitting as an acyclic agreement forest for $T$ and $T'$ that can be obtained from $\mathcal{B}_T$ by repeated replacements of a chain element $\{a_1, a_2, \ldots, a_n\}$ with the elements $\{a_1\}, \{a_2\}, \ldots, \{a_n\}$.

Lemma 6. Let $\mathcal{B}_T$ be the chain forest for two rooted binary phylogenetic $X$-trees $T$ and $T'$. Let $\{a_1, a_2, \ldots, a_n\}$ be a chain element in $\mathcal{B}_T$, and let $C_j$ be a non-chain element in $\mathcal{B}_T$. Furthermore, let $\mathcal{B}'_T = (\mathcal{B}_T - \{\{a_1, a_2, \ldots, a_n\}\}) \cup \{\{a_1\}, \{a_2\}, \ldots, \{a_n\}\}$. Then

(1) no directed cycle of $G_{\mathcal{B}'_T}$ passes through an element of $\{\{a_1\}, \{a_2\}, \ldots, \{a_n\}\}$, and

(2) no directed cycle of $G_{\mathcal{B}_T}$ passes through $C_j$.

Proof. By the definition of $\mathcal{B}_T$, note that $|C_j| = 1$. If $C_j = \{\rho\}$, then the indegree of $C_j$ is 0 in $G_{\mathcal{B}_T}$. Otherwise, if $C_j \neq \{\rho\}$, then its element labels a leaf of $T$ and $T'$ and, thus, the outdegree of $C_j$ is 0 in $G_{\mathcal{B}_T}$. Furthermore, since each element in $\{\{a_1\}, \{a_2\}, \ldots, \{a_n\}\}$ also labels a leaf of $T$ and $T'$, the outdegree of the vertices $a_1, a_2, \ldots, a_n$ in $G_{\mathcal{B}'_T}$ is 0. As a vertex with indegree 0 or outdegree 0 cannot lie on a directed cycle, this establishes the lemma. □
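The proof rests on the elementary fact that a vertex with indegree 0 or outdegree 0 lies on no directed cycle. A small Python check of that fact is sketched below; the toy inheritance graph (a root element `rho`, a leaf singleton `b`, and two chain elements `A` and `B` that point at each other) is hypothetical and only meant to illustrate the three cases in the lemma. A vertex lies on a directed cycle exactly when it can be reached again from one of its own children.

```python
from collections import deque
from typing import Dict, Hashable, Set

Graph = Dict[Hashable, Set[Hashable]]  # vertex -> set of out-neighbours


def on_directed_cycle(g: Graph, v: Hashable) -> bool:
    """True iff some directed cycle passes through v, i.e. v is reachable from a child of v."""
    seen = set()
    queue = deque(g.get(v, ()))
    while queue:
        u = queue.popleft()
        if u == v:
            return True
        if u in seen:
            continue
        seen.add(u)
        queue.extend(g.get(u, ()))
    return False


if __name__ == "__main__":
    g = {"rho": {"A", "b"}, "A": {"B", "b"}, "B": {"A"}, "b": set()}
    print({v: on_directed_cycle(g, v) for v in g})
```

Here `rho` (indegree 0) and `b` (outdegree 0) are never on a cycle, whereas `A` and `B` are.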

Let $\mathrm{OPT}(\mathcal{B}_T\text{-splitting})$ denote the size of a smallest $\mathcal{B}_T$-splitting.

Lemma 7. Let $T$ and $T'$ be two rooted binary phylogenetic $X$-trees, and let $\mathcal{B}_T$ be the chain forest for $T$ and $T'$. Then $\mathrm{OPT}(\mathcal{B}_T\text{-splitting}) \leq 6h(T,T')$.

Proof. Let $\mathcal{F}$ be a maximum acyclic agreement forest for $T$ and $T'$. In this proof, we view an agreement forest as a collection of trees (see the remark below the definition in Section 2). Thus, $\mathcal{F}$ can be obtained from $T$ (or equivalently from $T'$) by deleting an $(|\mathcal{F}| - 1)$-sized subset, say $E^T_{\mathcal{F}}$, of the edges of $T$ and cleaning up. Similarly, $\mathcal{B}_T$ can be obtained from $T$ (or equivalently from $T'$) by deleting a $(|\mathcal{B}_T| - 1)$-sized subset, say $E^T_{\mathcal{B}}$, and cleaning up. Now consider the forest $\mathcal{B}'_T$ obtained from $T$ by removing the edge set $E^T_{\mathcal{F}} \cup E^T_{\mathcal{B}}$ and cleaning up.

We claim that $\mathcal{B}'_T$ is a $\mathcal{B}_T$-splitting. To see this, first observe that $\mathcal{B}'_T$ is an acyclic agreement forest for $T$ and $T'$ because it can be obtained by removing the edge set $E^T_{\mathcal{B}}$ from $\mathcal{F}$ and cleaning up. Hence, to show that $\mathcal{B}'_T$ is a $\mathcal{B}_T$-splitting, it is left to show that it can be obtained from $\mathcal{B}_T$ by repeated replacements of a caterpillar on $\{a_1, a_2, \ldots, a_n\}$ by isolated vertices $\{a_1\}, \{a_2\}, \ldots, \{a_n\}$. By its definition, $\mathcal{B}'_T$ can be obtained from $\mathcal{B}_T$ by removing edges and cleaning up. Thus, what is left to prove is that each chain either survives or is atomized. For $n$-chains with $n \geq 3$, this follows from Lemma 1, and for $n = 2$ it is clear because $\mathcal{B}'_T$ can be obtained by removing edges from $\mathcal{B}_T$, in which each 2-chain is a component on its own.

Figure 3. Two input trees $T$ and $T'$, their auxiliary graph $G$ (with weights between parentheses) and an acyclic agreement forest $\mathcal{F}$ of $T$ and $T'$. Note that $\mathcal{F}$ is a $\mathcal{B}_T$-splitting because it can be obtained from the chain forest $\mathcal{B}_T$ by atomizing the chains $a = (a_1, \ldots, a_3)$ and $b = (b_1, \ldots, b_3)$. Also note that $\mathcal{F}$ has 9 components, which equals the weight, 8, of a minimum feedback vertex set $\{a, b, c, d\}$ of $G$, plus a single non-chain taxon (in this case, $\rho$).

As the size of $\mathcal{B}'_T$ is equal to the number of edges removed to obtain it from $T$ plus one, we have

$$|\mathcal{B}'_T| \leq |E^T_{\mathcal{F}}| + |E^T_{\mathcal{B}}| + 1 = |\mathcal{F}| - 1 + |\mathcal{B}_T| \leq h(T,T') + 5h(T,T') = 6h(T,T'),$$

where $|\mathcal{F}| - 1 = h(T,T')$ by Theorem 2 and Lemma 5 is used to bound $|\mathcal{B}_T|$. This establishes the lemma. □



We are now in a position to prove the main result of this section. 



Proof of Theorem 4. Throughout this proof, let $n \geq 2$. Furthermore, let $\mathcal{B}_T$ be the chain forest for $T$ and $T'$, and let $G$ be the graph obtained from the inheritance graph $G_{\mathcal{B}_T}$ by successively

(1) weighting each vertex that corresponds to a common $n$-chain $(a_1, a_2, \ldots, a_n)$ of $T$ and $T'$ with weight $n$;

(2) deleting each vertex that corresponds to a non-chain taxon in $\mathcal{B}_T$; and

(3) for each remaining vertex $v$, creating a new vertex $\bar{v}$ with weight 1 and two new edges $(v, \bar{v})$ and $(\bar{v}, v)$.

Furthermore, let $w$ be the weight function on the vertices of $G$. See Figure 3 for an example of the construction of $G$. We call the added vertices $\bar{v}$ the barred vertices of $G$. Note that each common $n$-chain of $T$ and $T'$ is represented by a vertex and its barred vertex in $G$. As $\mathcal{B}_T$ can be calculated in polynomial time, the construction of $G$ also takes polynomial time, and the size of $G$ is clearly polynomial in the cardinality of $\mathcal{B}_T$.
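Steps (1) to (3) can be sketched in a few lines of Python. The sketch assumes the inheritance graph is given as an adjacency dictionary and that the length of every common chain is known; these input conventions, the function name and the `("bar", v)` encoding of barred vertices are our own illustrative choices, not the paper's.

```python
from typing import Dict, Hashable, Set, Tuple

Graph = Dict[Hashable, Set[Hashable]]


def build_weighted_graph(
    inh: Graph, chain_len: Dict[Hashable, int]
) -> Tuple[Graph, Dict[Hashable, int]]:
    """Apply steps (1)-(3) to the inheritance graph `inh` of the chain forest.

    chain_len maps every chain element (vertex of `inh`) to the length n of its
    common n-chain; all other vertices are treated as non-chain elements.
    Barred vertices are encoded as ("bar", v).
    """
    chains = set(chain_len)
    # (2) keep only chain vertices and the edges between them
    adj: Graph = {v: {u for u in inh.get(v, ()) if u in chains} for v in chains}
    # (1) weight n for a common n-chain
    w: Dict[Hashable, int] = {v: chain_len[v] for v in chains}
    # (3) barred twin of weight 1 forming a 2-cycle with v
    for v in chains:
        bar = ("bar", v)
        w[bar] = 1
        adj[bar] = {v}
        adj[v].add(bar)
    return adj, w
```

For instance, with `inh = {"rho": {"a", "b"}, "a": {"b"}, "b": {"a", "x"}, "x": set()}` and chain lengths `{"a": 3, "b": 2}`, the vertices `rho` and `x` disappear and the result has four vertices with weights 3, 2, 1, 1.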

Now, regarding $G$ as an instance of wDFVS, we claim the following.

Claim. There exists a $\mathcal{B}_T$-splitting of size $k + s$, where $s$ is the number of non-chain elements in $\mathcal{B}_T$, if and only if $G$ has an FVS of weight $k$.
Suppose that $\mathcal{B}'_T$ is a $\mathcal{B}_T$-splitting of size $k + s$. Hence, $k$ is equal to the number of chain elements in $\mathcal{B}_T$ that are also elements in $\mathcal{B}'_T$ plus the total number of leaves in common $n$-chains that are atomized in $\mathcal{B}'_T$. Let $\mathcal{B}''_T$ be the forest that has been obtained from $\mathcal{B}'_T$ by deleting all singletons, and let $G_{\mathcal{B}''_T}$ be its inheritance graph. Since $G_{\mathcal{B}'_T}$ is acyclic, $G_{\mathcal{B}''_T}$ is also acyclic. Now, let $G'$ be the directed graph that has been obtained from $G$ in the following way. For each non-barred vertex $v$ in $G$, delete $v$ if $v$ corresponds to an $n$-chain of $T$ and $T'$ that is atomized in $\mathcal{B}'_T$, and delete $\bar{v}$ if $v$ corresponds to an $n$-chain of $T$ and $T'$ that is not atomized in $\mathcal{B}'_T$. Note that for each 2-cycle $(v, \bar{v}, v)$ of $G$ either $v$ or $\bar{v}$ is not a vertex of $G'$, because each $n$-chain that is common to $T$ and $T'$ is either atomized or not in $\mathcal{B}'_T$. This in turn implies that $G'$ is acyclic, because $G_{\mathcal{B}''_T}$ is isomorphic to $G' \setminus \bar{V}$, where $\bar{V}$ contains precisely the barred vertices of $G'$. Hence, the set $V$ of vertices of $G$ that are not vertices of $G'$ is an FVS of $G$. Furthermore, by the weighting of $G$, the weight of $V$ is exactly $k$.

Conversely, suppose that there exists an FVS of $G$, say $V$, with weight $k$. This implies that we can remove a set $V_1$ of barred vertices and a set $V_2 = V \setminus V_1$ of non-barred vertices such that $\sum_{v_i \in V_2} w(v_i) + |V_1| = k$ and the graph $G' = G \setminus V$ is acyclic. For each vertex $v_i \in V_2$, let $A_i = (a_{i,1}, a_{i,2}, \ldots, a_{i,n})$ be its associated common chain of $T$ and $T'$, and note that $w(v_i)$ is the number of elements in $A_i$. Furthermore, let $V'_1$ be the subset of $V_1$ that contains precisely each vertex $\bar{v}$ of $V_1$ for which $v \notin V_2$. If $|V'_1| < |V_1|$, then it is easily checked that $V'_1 \cup V_2$ is an FVS of $G$ whose weight is strictly less than $k$. Therefore, we may assume for the remainder of this proof that $|V'_1| = |V_1|$. Now, let $\mathcal{B}'_T$ be the forest that has been obtained from $\mathcal{B}_T$ in the following way. For each vertex $v_i$ in $V_2$, replace $A_i$ in $\mathcal{B}_T$ with the elements $\{a_{i,1}\}, \{a_{i,2}\}, \ldots, \{a_{i,n}\}$. Thus, $A_i$ is atomized in $\mathcal{B}'_T$. We next construct the inheritance graph $G_{\mathcal{B}'_T}$ from $G_{\mathcal{B}_T}$. For each vertex $v$ of $G_{\mathcal{B}_T}$ that corresponds to a common $n$-chain $(a_1, \ldots, a_n)$ of $T$ and $T'$ that is atomized in $\mathcal{B}'_T$, replace $v$ with the vertices $a_1, a_2, \ldots, a_n$, delete each edge $(v, u)$ of $G_{\mathcal{B}_T}$, and replace each edge $(u, v)$ of $G_{\mathcal{B}_T}$ with the edges $(u, a_1), (u, a_2), \ldots, (u, a_n)$. By Lemma 6, the vertices $a_1, a_2, \ldots, a_n$ have outdegree 0 in $G_{\mathcal{B}'_T}$. Noting that there is a natural bijection between the cycles in $G_{\mathcal{B}_T}$ and the cycles in $G$ that do not pass through any barred vertex, it follows that, as $G'$ is acyclic, $G_{\mathcal{B}'_T}$ is also acyclic. Hence, $\mathcal{B}'_T$ is a $\mathcal{B}_T$-splitting for $T$ and $T'$. The claim now follows from

$$|\mathcal{B}'_T| = s + \sum_{v_i \in V_2} w(v_i) + |V_1| = s + k.$$

It remains to show that the reduction is approximation preserving. Suppose that there exists a polynomial-time $c$-approximation for wDFVS. Let $k$ be the weight of a solution returned by this algorithm, and let $k^*$ be the weight of an optimal solution. By the above claim, we can then construct a solution to MAAF of size $k + s$, from which we can obtain a solution to MinimumHybridization with value $k + s - 1$ by Theorem 2. We have

$$k + s - 1 < ck^* + s \leq ck^* + cs = c(k^* + s) = c \cdot \mathrm{OPT}(\mathcal{B}_T\text{-splitting}),$$

where the second inequality uses $c > 1$ and the last equality holds by the claim. This gives a constant factor $c$-approximation for finding an optimal $\mathcal{B}_T$-splitting. Now, by Lemma 7,

$$k + s - 1 < c \cdot \mathrm{OPT}(\mathcal{B}_T\text{-splitting}) \leq 6c \cdot h(T,T'),$$

thereby establishing that, if there exists a polynomial-time $c$-approximation for wDFVS, then there exists a polynomial-time $6c$-approximation for MinimumHybridization. This concludes the proof of the theorem. □

It is not too difficult to extend Theorem 4 to DFVS, i.e. the unweighted variant of directed feedback vertex set.

Theorem 5. If, for some $c > 1$, there exists a polynomial-time $c$-approximation for DFVS, then there exists a polynomial-time $6c$-approximation for MinimumHybridization.
Proof. In the proof of Theorem 4 we create an instance $G$ of wDFVS. Let $w$ be the weight function on the vertices of $G$. Note that $w$ is non-negative and integral and that $w(v) \leq |X|$ for every vertex $v \in G$, i.e. the weight function is polynomially bounded in the input size. We create an instance $G'$ of DFVS as follows. For each vertex $v$ in $G$ we create $w(v)$ vertices $v_1, \ldots, v_{w(v)}$ in $G'$. For each edge $(u, v)$ in $G$ we introduce the edges $\{(u_i, v_j) \mid 1 \leq i \leq w(u),\ 1 \leq j \leq w(v)\}$ in $G'$. Solutions to wDFVS($G$) and DFVS($G'$) are very closely related, which allows us to use $G'$ and DFVS instead of $G$ and wDFVS in the proof of Theorem 4. Specifically, consider any feedback vertex set $F'$ of $G'$ of size $k$. We create a feedback vertex set $F$ of $G$ as follows. For each vertex $v \in G$, we include $v$ in $F$ if and only if all the vertices $v_1, \ldots, v_{w(v)}$ are in $F'$. Note that the weight of $F$ is at most $k$. To see that $F$ is a feedback vertex set, suppose some cycle $C = u, v, x, \ldots, u$ survives in $G$. But then, for each vertex $y \in C$, some copy $y_i$ survives in $G'$, which means that a cycle also survives in $G'$, contradicting the assumption that $F'$ is a feedback vertex set. In the other direction, observe that any feedback vertex set $F$ of $G$ of weight $k$ can be transformed into a feedback vertex set $F'$ of $G'$ of size $k$ as follows: for each $v \in F$, place all of $v_1, \ldots, v_{w(v)}$ in $F'$. □
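The vertex blow-up used in this proof is mechanical. The Python sketch below (function names hypothetical; the exhaustive search is only meant for toy graphs) builds $G'$ from a weighted graph $G$ and lets one check, on small examples, that the minimum weight of an FVS of $G$ equals the minimum size of an FVS of $G'$. Acyclicity is tested with Kahn's topological-sort algorithm.

```python
from itertools import combinations
from typing import Dict, FrozenSet, Hashable, Set

Graph = Dict[Hashable, Set[Hashable]]


def blow_up(adj: Graph, w: Dict[Hashable, int]) -> Graph:
    """v becomes copies (v, 1..w(v)); each edge (u, v) becomes all edges (u_i, v_j)."""
    copies = {v: [(v, i) for i in range(1, w[v] + 1)] for v in adj}
    big: Graph = {c: set() for cs in copies.values() for c in cs}
    for u, succs in adj.items():
        for v in succs:
            for ui in copies[u]:
                big[ui].update(copies[v])
    return big


def is_acyclic(adj: Graph, removed: FrozenSet[Hashable] = frozenset()) -> bool:
    """Kahn's algorithm on the subgraph induced by the vertices not in `removed`."""
    indeg = {v: 0 for v in adj if v not in removed}
    for v in indeg:
        for u in adj[v]:
            if u in indeg:
                indeg[u] += 1
    stack = [v for v, d in indeg.items() if d == 0]
    visited = 0
    while stack:
        v = stack.pop()
        visited += 1
        for u in adj[v]:
            if u in indeg:
                indeg[u] -= 1
                if indeg[u] == 0:
                    stack.append(u)
    return visited == len(indeg)


def min_weight_fvs(adj: Graph, w: Dict[Hashable, int]) -> int:
    """Exhaustive search, tiny graphs only: minimum total weight of an FVS."""
    verts = list(adj)
    best = sum(w[v] for v in verts)  # removing everything is always feasible
    for k in range(len(verts) + 1):
        for sub in combinations(verts, k):
            if is_acyclic(adj, frozenset(sub)):
                best = min(best, sum(w[v] for v in sub))
    return best
```

On the weighted 3-cycle $a \to b \to c \to a$ with weights 2, 2 and 3, both the minimum weighted FVS of $G$ and the minimum (unit-weight) FVS of the 7-vertex graph $G'$ have value 2.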



Moreover, the reduction in the proof of Theorem 4 is not restricted to constant $c$, which we exploit in the next corollary.

Corollary 1. There exists a polynomial-time $O(\log r \log\log r)$-approximation for MinimumHybridization, where $r = h(T,T')$.



Proof. In [14], which extended [35], a polynomial-time approximation algorithm for wDFVS is presented whose approximation ratio is $O(\min(\log |V| \log\log |V|, \log \tau^* \log\log \tau^*))$, where $|V|$ is the number of vertices in the wDFVS instance and $\tau^*$ is the optimal fractional solution value of the problem. We show that in the wDFVS instance $G$ that we create in the proof of Theorem 4 both the number of vertices in $G$ and the optimal fractional solution value of wDFVS($G$) are $O(r)$. To see that $G$ has $O(r)$ vertices, observe that $G$ contains two vertices for every chain element in the chain forest $\mathcal{B}_T$, and that (by Lemma 5) $|\mathcal{B}_T| < 5r$. Secondly, recall from Lemma 7 that $\mathrm{OPT}(\mathcal{B}_T\text{-splitting}) \leq 6r$. By construction (and the claim in the proof of Theorem 4), $\mathrm{OPT}(\mathcal{B}_T\text{-splitting})$ is an upper bound on the optimum solution value of wDFVS($G$), hence on $\tau^*$. Thus, given $G$ as input, the algorithm in [13] constructs a feedback vertex set that is at most a factor $O(\log r \log\log r)$ larger than the true optimal solution of wDFVS($G$). As shown in the proof of Theorem 4, this can be used to obtain an approximation ratio at most 6 times larger for MAAF, which is clearly also $O(\log r \log\log r)$. □



Finally, note that for a given instance the actual approximation ratio obtained by Corollary 1 will sometimes be determined by $|V|$ and sometimes by $\tau^*$, and can potentially be significantly smaller than $O(\log r \log\log r)$. For example, if there are very few chains in the chain forest, but they are all extremely long, then it can happen that $|V| \ll \tau^*$. Conversely, if the chain forest contains many short chains, and only a small number of them need to be atomized to attain acyclicity, then it can happen that $\tau^* \ll |V|$.



5. AN APPROXIMATION-PRESERVING REDUCTION FROM DFVS TO MINIMUMHYBRIDIZATION



In this section we prove the following theorem. 

Theorem 6. If, for some $c > 1$, there exists a polynomial-time $c$-approximation algorithm for MinimumHybridization, then there exists a polynomial-time $(c + \varepsilon)$-approximation algorithm for DFVS for all $\varepsilon > 0$.



Formally, what we demonstrate is an L-reduction from wDFVS to DFVS with coefficients $\alpha = \beta = 1$, which works for instances with polynomially bounded weights.
Figure 4. An instance $D$ of DFVS and the modified graph $D'$.



Proof. We show an approximation-preserving reduction from DFVS to MAAF. The theorem then follows from the equivalence of MAAF and MinimumHybridization described in Theorem 2.

Let $D = (V, A)$ be an instance of DFVS. First we transform $D$ into an auxiliary graph $D'$. For a vertex $v$ of $D$, we denote the parents of $v$ as $u_1, u_2, \ldots, u_{d^-(v)}$ and the children of $v$ as $w_1, w_2, \ldots, w_{d^+(v)}$. (To facilitate the exposition, we assume a total order on the parents of each vertex and on the children of each vertex.) We construct the graph $D'$ as follows. For every vertex $v \in V$, $D'$ has vertices $v_{\mathrm{in}}^1, v_{\mathrm{in}}^2, \ldots, v_{\mathrm{in}}^{d^-(v)}$, vertices $v^-$ and $v^+$, as well as vertices $v_{\mathrm{out}}^1, v_{\mathrm{out}}^2, \ldots, v_{\mathrm{out}}^{d^+(v)}$. The edges of $D'$ are as follows. For each vertex $v \in V$, $D'$ has edges from each of $v_{\mathrm{in}}^1, \ldots, v_{\mathrm{in}}^{d^-(v)}$ to $v^-$, an edge from $v^-$ to $v^+$, and edges from $v^+$ to each of $v_{\mathrm{out}}^1, \ldots, v_{\mathrm{out}}^{d^+(v)}$. In addition, for each edge $(u, v)$ of $D$, there is an edge $(u_{\mathrm{out}}^i, v_{\mathrm{in}}^j)$ in $D'$, where $v$ is the $i$-th child of $u$ and $u$ is the $j$-th parent of $v$. This concludes the construction of $D'$. An example is given in Figure 4.

We now first show that $D$ has an FVS of size at most $f$ if and only if $D'$ has an FVS of size at most $f$. Observe that each directed cycle of $D$ corresponds to a directed cycle of $D'$ and vice versa. Thus, from an FVS $F$ of $D$, we can construct an FVS $F'$ of $D'$ by adding $v^-$ to $F'$ for each $v \in F$. Conversely, from an FVS $U'$ of $D'$, we can create an FVS $U$ of $D$ as follows: a vertex $v$ of $D$ is put in $U$ if and only if at least one of the corresponding vertices $v_{\mathrm{in}}^1, \ldots, v_{\mathrm{in}}^{d^-(v)}, v^-, v^+, v_{\mathrm{out}}^1, \ldots, v_{\mathrm{out}}^{d^+(v)}$ is in $U'$.
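The construction of $D'$ can be sketched in Python as follows. This is an illustrative sketch, not the authors' code: the vertex encodings `("in", v, j)`, `("-", v)`, `("+", v)` and `("out", v, i)` are our own, $D$ is assumed to have no parallel arcs, and the order of `arcs` plays the role of the assumed total order on parents and children.

```python
from typing import Dict, Hashable, List, Sequence, Set, Tuple

Graph = Dict[Hashable, Set[Hashable]]


def build_d_prime(
    vertices: Sequence[Hashable], arcs: Sequence[Tuple[Hashable, Hashable]]
) -> Graph:
    """Return D' as an adjacency dict; the order of `arcs` fixes the parent/child orders."""
    parents: Dict[Hashable, List[Hashable]] = {v: [] for v in vertices}
    children: Dict[Hashable, List[Hashable]] = {v: [] for v in vertices}
    for u, v in arcs:
        children[u].append(v)
        parents[v].append(u)

    adj: Graph = {}
    for v in vertices:
        ins = [("in", v, j) for j in range(1, len(parents[v]) + 1)]
        outs = [("out", v, i) for i in range(1, len(children[v]) + 1)]
        for x in ins:
            adj[x] = {("-", v)}          # v_in^j -> v^-
        adj[("-", v)] = {("+", v)}       # v^- -> v^+
        adj[("+", v)] = set(outs)        # v^+ -> v_out^i
        for x in outs:
            adj[x] = set()

    for u, v in arcs:
        i = children[u].index(v) + 1     # v is the i-th child of u
        j = parents[v].index(u) + 1      # u is the j-th parent of v
        adj[("out", u, i)].add(("in", v, j))
    return adj
```

Since every arc contributes one $v_{\mathrm{in}}$ and one $v_{\mathrm{out}}$ vertex, the resulting graph has $2|V| + 2|A|$ vertices, which is easy to confirm on small inputs.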

Intuitively, the idea of our reduction is as follows. We will construct two rooted binary trees T 
and T' consisting of long chains. We build them in such a way that the graph D' is basically the 
inheritance graph of the chain forest for T and T'. This graph can be made acyclic by atomizing 
some of the chains. Thus, solving DFVS on D' is basically equivalent to deciding which chains to 
atomize. We give all chains that can be atomized the same length. Hence, since each chain
that is atomized adds the same number of components to the agreement forest, solving DFVS 
on D' is essentially equivalent to finding a maximum acyclic agreement forest for T and T' . 

Figure 5. $T$: the first tree of the constructed MAAF instance (panels labelled $x$-chain, $y$-chain and $z$-chain).

Figure 6. $T'$: the second tree of the constructed MAAF instance.

Before we proceed, we need some more definitions. Recall that an $n$-chain of a tree is an $n$-tuple $(a_1, a_2, \ldots, a_n)$ of leaves such that the parent of $a_1$ is either the same as the parent of $a_2$ or the parent of $a_1$ is a child of the parent of $a_2$ and, for each $i \in \{2, 3, \ldots, n - 1\}$, the parent of $a_i$ is a child of the parent of $a_{i+1}$. A tree $T$ whose leaf set $\mathcal{L}(T)$ is a chain of $T$ is called a caterpillar on $\mathcal{L}(T)$. It is easy to see that, for every chain $C$, there exists a unique caterpillar on $C$. By hanging a chain $C$ below a leaf $x$, we mean the following: subdivide the edge entering $x$ by a new vertex $v$ and add an edge from $v$ to the root of the caterpillar on $C$. When we hang a chain $C_1$ below a chain $C_2$, we hang the caterpillar on $C_1$ below the lowest leaf (or a lowest leaf) of $C_2$. By replacing a leaf $x$ by a chain $C$ we mean: delete $x$ and add an edge from its former parent to the root of the caterpillar on $C$.
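To make these tree operations concrete, the following Python sketch encodes rooted binary trees as nested pairs, a hypothetical encoding chosen for brevity: a leaf is a string and an internal vertex is a tuple of its two children. Under that assumption, the caterpillar, hanging and replacing operations defined above become short recursive functions; the function names are ours.

```python
from typing import Sequence, Tuple, Union

Tree = Union[str, Tuple["Tree", "Tree"]]


def caterpillar(chain: Sequence[str]) -> Tree:
    """Caterpillar on (a_1, ..., a_n): a_1, a_2 form the bottom cherry, a_n hangs off the root."""
    if len(chain) < 2:
        raise ValueError("a chain has at least two leaves")
    tree: Tree = (chain[0], chain[1])
    for leaf in chain[2:]:
        tree = (tree, leaf)
    return tree


def hang_below_leaf(tree: Tree, x: str, sub: Tree) -> Tree:
    """Subdivide the edge entering leaf x by a new vertex whose children are x and sub."""
    if isinstance(tree, str):
        return (x, sub) if tree == x else tree
    return (hang_below_leaf(tree[0], x, sub), hang_below_leaf(tree[1], x, sub))


def replace_leaf(tree: Tree, x: str, sub: Tree) -> Tree:
    """Delete leaf x and attach sub to the former parent of x."""
    if isinstance(tree, str):
        return sub if tree == x else tree
    return (replace_leaf(tree[0], x, sub), replace_leaf(tree[1], x, sub))


def hang_chain_below_chain(tree: Tree, c2: Sequence[str], c1: Sequence[str]) -> Tree:
    """Hang the caterpillar on c1 below a lowest leaf of c2 (here its first leaf a_1)."""
    return hang_below_leaf(tree, c2[0], caterpillar(c1))
```

For example, `caterpillar(["a1", "a2", "a3"])` yields `(("a1", "a2"), "a3")`, and hanging the chain `("y1", "y2")` below the chain `("z1", "z2")` yields `(("z1", ("y1", "y2")), "z2")`.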

We are now ready to construct an instance of MAAF. The trees $T$ and $T'$ will be built from chains of three types: $x$-type, $y$-type and $z$-type. The $x$-type chains have length $\ell$, while the $y$-type and $z$-type chains have length $L$ (with $L \gg \ell$). Each of these chains will be common to both trees. Recall that, by Lemma 1, we may assume that every chain either survives or is atomized. The idea is that $y$-type chains and $z$-type chains are so long that they will all survive. The $x$-type chains are shorter and might be atomized. In fact, the $x$-type chains that are atomized will correspond to an FVS of $D'$.
We build the trees $T$ and $T'$ as follows. For each vertex of $D'$ of the type $v^-$ or $v^+$ we create an $x$-type chain. For each other vertex of $D'$ we create a $y$-type chain. Finally, for each vertex and each edge of the original graph $D$ we create a $z$-type chain. All leaves of all chains have different labels. Now we combine the chains into two trees as follows.

First $T$. Start with an arbitrary rooted binary tree on $|V| + |A|$ leaves and replace each leaf by a $z$-type chain. We call the current tree $T_0$. For each edge $(u, v)$ of $D$, the tree contains a $z$-type chain. Hang below this $z$-type chain the $y$-type chain for $u_{\mathrm{out}}^i$ and below that the $y$-type chain for $v_{\mathrm{in}}^j$, where $(u_{\mathrm{out}}^i, v_{\mathrm{in}}^j)$ is the edge of $D'$ corresponding to $(u, v)$. Furthermore, for each vertex $v$ of $D$, the tree also has a $z$-type chain. Hang below this $z$-type chain the $x$-type chain for $v^-$ and below that the $x$-type chain for $v^+$.

Now $T'$. Start with an arbitrary rooted binary tree on $2|V|$ leaves, so that we have two leaves for each vertex $v$ of $D$. Replace one of them by a concatenation of (from top to bottom) the $y$-type chains for $v_{\mathrm{in}}^1, v_{\mathrm{in}}^2, \ldots, v_{\mathrm{in}}^{d^-(v)}$ and the $x$-type chain for $v^-$. Replace the other leaf for $v$ by a concatenation of (from top to bottom) the $x$-type chain for $v^+$ and the $y$-type chains for $v_{\mathrm{out}}^1, v_{\mathrm{out}}^2, \ldots, v_{\mathrm{out}}^{d^+(v)}$. Finally, hang a copy of $T_0$ below the root. This concludes the construction of the MAAF instance. For an example, see Figures 5 and 6.

We claim that $D'$ (and thus $D$) has an FVS of size at most $f$ if and only if there exists an acyclic agreement forest of $T$ and $T'$ of size at most $1 + 2(|A| + |V|) + (\ell - 1)f$.

To show this, consider the agreement forest $\mathcal{A}_D$ for $T$ and $T'$ in which $T_0$ is one component, each $x$-type chain is one component, and each $y$-type chain is one component. The inheritance graph $G_{\mathcal{A}_D}$ of this agreement forest can be obtained by making some (minor) changes to $D'$. First, add a vertex labelled $T_0$ with edges to all other vertices. Secondly, for each $v \in V$, add an edge $(v_{\mathrm{in}}^i, v_{\mathrm{in}}^j)$ for each pair $i, j$ with $1 \leq i < j \leq d^-(v)$ and an edge $(v_{\mathrm{out}}^i, v_{\mathrm{out}}^j)$ for each pair $i, j$ with $1 \leq i < j \leq d^+(v)$. Observe that, given an FVS of $D'$, there exists an FVS of $D'$ of at most the same size that consists only of vertices of the type $v^-$, because every directed cycle of $D'$ through $v_{\mathrm{in}}^j$, $v^+$ or $v_{\mathrm{out}}^i$ also passes through $v^-$. Such an FVS is also an FVS of $G_{\mathcal{A}_D}$, since any directed cycle passing through any of the newly added edges $(v_{\mathrm{in}}^i, v_{\mathrm{in}}^j)$ or $(v_{\mathrm{out}}^i, v_{\mathrm{out}}^j)$ also passes through $v^-$. Thus, if we consider (without loss of generality) only FVSs consisting of $v^-$-type vertices, then any FVS of $D'$ is an FVS of $G_{\mathcal{A}_D}$ and vice versa. In addition, since $v^-$-type vertices correspond to $x$-type chains, it is possible to make $G_{\mathcal{A}_D}$ acyclic by atomizing only $x$-type chains.

Let $F$ be an FVS of $D$ and let $F'$ (as before) be the corresponding FVS of $D'$ that contains only vertices of the type $v^-$. Then we can construct an agreement forest $\mathcal{R}$ of $T$ and $T'$ as follows. One component consists of the tree $T_0$. Each of the $y$-type chains is also one component, as are the $x$-type chains that do not correspond to vertices in $F'$. Finally, for each other $x$-type chain (that does correspond to a vertex in $F'$), we create a separate component for each leaf. Thus, the number of components is $1 + 2|A| + (2|V| - |F'|) + \ell|F'| = 1 + 2(|A| + |V|) + (\ell - 1)|F|$. We have to show that the inheritance graph $G_{\mathcal{R}}$ is acyclic. We can construct $G_{\mathcal{R}}$ from $G_{\mathcal{A}_D}$ as follows. Delete every vertex $v^- \in F'$ and instead add a vertex for each leaf of the corresponding $x$-type chain, with incoming edges from $T_0$ and from $v_{\mathrm{in}}^1, \ldots, v_{\mathrm{in}}^{d^-(v)}$. Since we only introduced leaves with incoming edges, this modification does not create any directed cycles. Thus, since $F'$ contains a vertex of each directed cycle of $G_{\mathcal{A}_D}$, and all vertices of $F'$ have been removed, $G_{\mathcal{R}}$ is acyclic. It follows that $\mathcal{R}$ is an acyclic agreement forest for $T$ and $T'$.
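The component count above is simple arithmetic. As a quick check, the following Python sketch (helper names hypothetical) counts the components of $\mathcal{R}$ piece by piece, as in the construction, and compares the total with the claimed size $1 + 2(|A| + |V|) + (\ell - 1)f$.

```python
def claimed_size(num_arcs: int, num_vertices: int, ell: int, f: int) -> int:
    """1 + 2(|A| + |V|) + (ell - 1) f, the size claimed for the agreement forest."""
    return 1 + 2 * (num_arcs + num_vertices) + (ell - 1) * f


def component_count(num_arcs: int, num_vertices: int, ell: int, f: int) -> int:
    """Components of R when f of the 2|V| x-type chains are atomized."""
    t0 = 1                              # T_0 with the root and all z-type chains
    y_chains = 2 * num_arcs             # one per v_in^j and one per v_out^i of D'
    surviving_x = 2 * num_vertices - f  # x-type chains for v^- and v^+ left intact
    atomized = ell * f                  # one singleton per leaf of an atomized x-chain
    return t0 + y_chains + surviving_x + atomized


if __name__ == "__main__":
    for f in range(4):
        assert component_count(5, 3, 7, f) == claimed_size(5, 3, 7, f)
```

Each atomized chain trades one component for $\ell$ singletons, which is where the factor $\ell - 1$ comes from.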

To show the other direction, let $\mathcal{A}$ be an acyclic agreement forest of $T$ and $T'$. We may assume that all $y$-type chains and $z$-type chains survive in $\mathcal{A}$, since we can choose $L$ sufficiently large. To see this, recall that by Lemma 1 we may assume that each chain either survives or is atomized. Hence, if a $y$-type chain or $z$-type chain does not survive, it is atomized and adds $L$ components to the agreement forest. Thus, by choosing $L$ large enough (as will be specified later) we can make sure that all $y$-type chains and $z$-type chains survive. Secondly, observe that we may in addition assume that all $z$-type chains are together in a single component (if they are not, we can put them together and reduce the number of components). Now consider two chains that are not both $z$-type chains. We show that these chains cannot be together in a single component of $\mathcal{A}$. Firstly, if one of the two chains is below the other in $T$, then they are next to each other in $T'$. Secondly, if the two chains are next to each other in $T$, then they are separated by a $z$-type chain in $T$ but not in $T'$. Hence, by (2) in the definition of an agreement forest, the two chains cannot be together in a single component of $\mathcal{A}$. Thus, the components of $\mathcal{A}$ are as follows. Tree $T_0$ is the component containing the root and all $z$-type chains. Furthermore, each $y$-type chain, each surviving $x$-type chain, and each leaf of a non-surviving $x$-type chain is a separate component. Let $F$ be the set of vertices of $G_{\mathcal{A}_D}$ corresponding to the non-surviving $x$-type chains. Thus, each vertex in $F$ is of the type $v^-$ or $v^+$. We will show that $F$ is an FVS of $G_{\mathcal{A}_D}$ and hence of $D'$. We can construct $G_{\mathcal{A}}$ from $G_{\mathcal{A}_D}$ as follows. Remove each vertex in $F$ from $G_{\mathcal{A}_D}$ and add each leaf of the corresponding $x$-type chain as a separate vertex. Then add edges to these newly added vertices (these edges are not important since they do not create any directed cycles). Since $\mathcal{A}$ is an acyclic agreement forest, $G_{\mathcal{A}}$ is acyclic and hence $F$ is an FVS. The size $|F|$ of the FVS is equal to the number of non-surviving $x$-type chains. Thus, $|\mathcal{A}| = 1 + 2|A| + (2|V| - |F|) + \ell|F| = 1 + 2(|A| + |V|) + (\ell - 1)|F|$.

The reduction is clearly polynomial time. It remains to show that it is approximation preserving. Suppose that there exists a $c$-approximation algorithm for MAAF. Say that $m$ is the size of the acyclic agreement forest returned by this algorithm and $m^*$ the size of an optimal solution. Recall that MAAF minimizes the size of an agreement forest minus one, so $m - 1 \leq c \cdot (m^* - 1)$. We have shown that $D$ has an FVS of size at most $f$ if and only if $T$ and $T'$ have an acyclic agreement forest of size at most $1 + 2(|A| + |V|) + (\ell - 1)f$. Thus, $m^* = 1 + 2(|A| + |V|) + (\ell - 1)f^*$, where $f^*$ is the size of a minimum FVS of $D$. Moreover, an approximate solution $f$ of DFVS can be computed from an approximate solution $m$ of MAAF by taking $f = (m - 1 - 2(|A| + |V|))/(\ell - 1)$. Then we have

$$
\begin{aligned}
f &= \frac{m - 1 - 2(|A| + |V|)}{\ell - 1} \\
&\leq \frac{c \cdot (m^* - 1) - 2(|A| + |V|)}{\ell - 1} \\
&= \frac{c\bigl(2(|A| + |V|) + (\ell - 1)f^*\bigr) - 2(|A| + |V|)}{\ell - 1} \\
&= c \cdot f^* + \frac{2(c - 1)(|A| + |V|)}{\ell - 1} \\
&= c \cdot f^* + 1
\end{aligned}
$$

if we take $\ell = 2(c - 1)(|A| + |V|) + 1$. We still need to specify the value of $L$, which needs to be sufficiently large so that all $y$-type chains and $z$-type chains survive. Since any graph trivially has an FVS of size $|V|$, any constructed MAAF instance has $m^* \leq 1 + 2(|A| + |V|) + (\ell - 1)|V|$. Thus, a $c$-approximation algorithm will return an acyclic agreement forest of size $m$ with $m - 1 \leq c(m^* - 1) \leq c\bigl(2(|A| + |V|) + (\ell - 1)|V|\bigr)$, and hence with $m \leq c\bigl(2(|A| + |V|) + (\ell - 1)|V|\bigr) + 1$. So it suffices to take

$$L = c\bigl(2(|A| + |V|) + (\ell - 1)|V|\bigr) + 2 = 2c(|A| + |V|)\bigl(1 + (c - 1)|V|\bigr) + 2.$$
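The parameter choices can be checked with exact rational arithmetic. In the following Python sketch (function names hypothetical) we use an integer $c$ so that $\ell$ is an integer, and `fractions.Fraction` to avoid rounding; the sketch assumes $c > 1$, since otherwise $\ell - 1 = 0$. It computes $\ell$ and $L$ as chosen above and confirms that, in the worst case $m - 1 = c(m^* - 1)$, the recovered value is $f = c \cdot f^* + 1$.

```python
from fractions import Fraction


def reduction_parameters(c: Fraction, num_arcs: int, num_vertices: int):
    """Return (ell, L) as chosen in the proof; assumes c > 1 so that ell - 1 > 0."""
    n = num_arcs + num_vertices
    ell = 2 * (c - 1) * n + 1
    big_l = c * (2 * n + (ell - 1) * num_vertices) + 2
    return ell, big_l


def fvs_size_from_forest(m, num_arcs: int, num_vertices: int, ell) -> Fraction:
    """f = (m - 1 - 2(|A| + |V|)) / (ell - 1), read off from a forest of size m."""
    return Fraction(m - 1 - 2 * (num_arcs + num_vertices)) / (ell - 1)


if __name__ == "__main__":
    c = Fraction(2)
    ell, big_l = reduction_parameters(c, num_arcs=4, num_vertices=3)
    f_star = 1
    m_star = 1 + 2 * (4 + 3) + (ell - 1) * f_star
    m = c * (m_star - 1) + 1  # worst case allowed by m - 1 <= c (m* - 1)
    print(ell, big_l, fvs_size_from_forest(m, 4, 3, ell))  # 15 114 3
```

With $c = 2$, $|A| = 4$ and $|V| = 3$, the closed form $2c(|A| + |V|)(1 + (c - 1)|V|) + 2 = 114$ agrees with the direct formula for $L$.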

Now take $\varepsilon > 0$. If $f^* < 1/\varepsilon$, we can compute an optimal solution for DFVS by brute force in polynomial time. Otherwise, $1 \leq \varepsilon \cdot f^*$ and we have

$$f \leq c \cdot f^* + \varepsilon \cdot f^* = (c + \varepsilon) f^*.$$

Thus, if there exists a $c$-approximation for MAAF, then there exists a $(c + \varepsilon)$-approximation for DFVS for every fixed $\varepsilon > 0$.

□
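The brute-force step can be sketched as follows. This is a straightforward exhaustive search of our own, not code from the paper: it tries all vertex subsets of size $0, 1, \ldots, \lceil 1/\varepsilon \rceil - 1$ in increasing order and tests acyclicity with Kahn's algorithm, so for fixed $\varepsilon$ it performs $O(|V|^{\lceil 1/\varepsilon \rceil})$ acyclicity tests, which is polynomial. If it returns nothing, then $f^* \geq 1/\varepsilon$ and the approximation route applies.

```python
import math
from itertools import combinations
from typing import Dict, FrozenSet, Hashable, Optional, Set

Graph = Dict[Hashable, Set[Hashable]]


def is_acyclic(adj: Graph, removed: FrozenSet[Hashable] = frozenset()) -> bool:
    """Kahn's algorithm on the subgraph induced by the vertices not in `removed`."""
    indeg = {v: 0 for v in adj if v not in removed}
    for v in indeg:
        for u in adj[v]:
            if u in indeg:
                indeg[u] += 1
    stack = [v for v, d in indeg.items() if d == 0]
    visited = 0
    while stack:
        v = stack.pop()
        visited += 1
        for u in adj[v]:
            if u in indeg:
                indeg[u] -= 1
                if indeg[u] == 0:
                    stack.append(u)
    return visited == len(indeg)


def small_fvs(adj: Graph, eps: float) -> Optional[Set[Hashable]]:
    """Return a minimum FVS if some FVS has fewer than 1/eps vertices, else None."""
    bound = math.ceil(1 / eps)
    verts = list(adj)
    for k in range(bound):               # sizes 0, 1, ..., ceil(1/eps) - 1
        for sub in combinations(verts, k):
            if is_acyclic(adj, frozenset(sub)):
                return set(sub)          # first hit has minimum size
    return None
```

For the graph with 2-cycles $a \leftrightarrow b$ and $b \leftrightarrow c$, `small_fvs` with $\varepsilon = 0.5$ returns `{"b"}`, while with $\varepsilon = 1$ it only tries the empty set and returns `None`.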
In contrast to the result in Section 4, the reduction above can only be used for constant $c$. It does not show that, e.g., an $O(\log |X|)$-approximation for MinimumHybridization would imply an $O(\log |V|)$-approximation for DFVS. Hence, it remains possible that MinimumHybridization admits an $O(\log |X|)$-approximation while DFVS does not admit an $O(\log |V|)$-approximation. For neither problem is such an approximation known to exist.

Finally, we note that Theorem 6 also allows us to improve upon the best-known inapproximability result for MinimumHybridization.

Corollary 2. There does not exist a polynomial-time $c$-approximation for MinimumHybridization where $c < 10\sqrt{5} - 21 \approx 1.3606$, unless P = NP. If the Unique Games Conjecture holds, then there does not exist a polynomial-time $c$-approximation for MinimumHybridization where $c < 2$.

Proof. In [25] a simple reduction is shown from the problem Vertex Cover to the problem DFVS. Specifically, given an undirected graph $G$ as input to Vertex Cover, we create a directed graph $G'$ by transforming each edge $\{u, v\}$ in $G$ into two directed edges $(u, v)$ and $(v, u)$ in $G'$. It is easy to show that $G'$ has a feedback vertex set of size $k$ if and only if $G$ has a vertex cover of size $k$. Consequently, any polynomial-time $c$-approximation algorithm for DFVS can be used to construct a polynomial-time $c$-approximation for Vertex Cover. The latter problem does not permit a polynomial-time $c$-approximation for any $c < 10\sqrt{5} - 21 \approx 1.3606$, unless P = NP [11] [12]. Also, it has been shown that if the Unique Games Conjecture is true then no approximation better than 2 is possible [28]. Now, the proof of Theorem 6 shows that, if there exists a $c$-approximation for MinimumHybridization, then there exists a $(c + \varepsilon)$-approximation for DFVS for every fixed $\varepsilon > 0$. Hence the existence of a $c$-approximation for MinimumHybridization where $c < 10\sqrt{5} - 21$ (respectively, $c < 2$) would mean the existence of a $c'$-approximation for DFVS (and thus also for Vertex Cover) where $c' < 10\sqrt{5} - 21$ (respectively, $c' < 2$). □
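The reduction from [25] is short enough to state as code. In this illustrative Python sketch (function names hypothetical), `vc_to_dfvs` builds the bidirected graph $G'$, and two exhaustive searches, usable only on tiny inputs, compare the minimum vertex cover of $G$ with the minimum feedback vertex set of $G'$. The correspondence rests on the 2-cycle $(u, v, u)$ created for each edge: an FVS must hit every such 2-cycle, so it is a vertex cover, and conversely removing a vertex cover leaves no arcs at all.

```python
from itertools import combinations
from typing import Callable, Dict, FrozenSet, Hashable, Iterable, List, Set, Tuple

Graph = Dict[Hashable, Set[Hashable]]


def vc_to_dfvs(edges: Iterable[Tuple[Hashable, Hashable]]) -> Graph:
    """Replace every undirected edge {u, v} by the two arcs (u, v) and (v, u)."""
    adj: Graph = {}
    for u, v in edges:
        adj.setdefault(u, set()).add(v)
        adj.setdefault(v, set()).add(u)
    return adj


def is_acyclic(adj: Graph, removed: FrozenSet[Hashable] = frozenset()) -> bool:
    """Kahn's algorithm on the subgraph induced by the vertices not in `removed`."""
    indeg = {v: 0 for v in adj if v not in removed}
    for v in indeg:
        for u in adj[v]:
            if u in indeg:
                indeg[u] += 1
    stack = [v for v, d in indeg.items() if d == 0]
    visited = 0
    while stack:
        v = stack.pop()
        visited += 1
        for u in adj[v]:
            if u in indeg:
                indeg[u] -= 1
                if indeg[u] == 0:
                    stack.append(u)
    return visited == len(indeg)


def is_vertex_cover(edges: Iterable[Tuple[Hashable, Hashable]], cover: FrozenSet[Hashable]) -> bool:
    return all(u in cover or v in cover for u, v in edges)


def min_size(vertices: List[Hashable], ok: Callable[[FrozenSet[Hashable]], bool]) -> int:
    """Smallest k such that some k-subset satisfies `ok` (exhaustive, tiny inputs only)."""
    for k in range(len(vertices) + 1):
        if any(ok(frozenset(sub)) for sub in combinations(vertices, k)):
            return k
    raise ValueError("no subset satisfies the predicate")
```

For the triangle $a, b, c$ with a pendant edge $\{c, d\}$, both searches return 2.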